Key Event Detection in Video using ASR and Visual Data

نویسندگان

  • Niraj Shrestha
  • Aparna Nurani Venkitasubramanian
  • Marie-Francine Moens
چکیده

Multimedia data grow day by day which makes it necessary to index them automatically and efficiently for fast retrieval, and more precisely to automatically index them with key events. In this paper, we present preliminary work on key event detection in British royal wedding videos using automatic speech recognition (ASR) and visual data. The system first automatically acquires key events of royal weddings from an external corpus such as Wikipedia, and then identifies those events in the ASR data. The system also models name and face alignment to identify the persons involved in the wedding events. We compare the results obtained with the ASR output with results obtained with subtitles. The error is only slightly higher when using ASR output in the detection of key events and their participants in the wedding videos compared to the results obtained with subtitles.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Recognition of Visual Events using Spatio-Temporal Information of the Video Signal

Recognition of visual events as a video analysis task has become popular in machine learning community. While the traditional approaches for detection of video events have been used for a long time, the recently evolved deep learning based methods have revolutionized this area. They have enabled event recognition systems to achieve detection rates which were not reachable by traditional approac...

متن کامل

Informedia @ TRECVID2010

The Informedia group participated in four tasks this year, including Semantic indexing, Known-item search, Surveillance event detection and Event detection in Internet multimedia pilot. For semantic indexing, except for training traditional SVM classifiers for each high level feature by using different low level features, a kind of cascade classifier was trained which including four layers with...

متن کامل

Multimedia Event Detection – Strong by Multi-Modality Integration

We will present our Multimedia Event Detection system with positive video exemplars (event query is defined by 10 or 100 positive videos), which achieves state-of-the-art performance by designing different fusion strategies for different modalities. First, in visual system, the standard fusion strategy is averaging probability scores obtained by different features. This strategy could achieve r...

متن کامل

PicSOM Experiments in TRECVID 2006

Our experiments in TRECVID 2006 include participation in the shot boundary detection, high-level feature extraction, and search tasks, using a common system framework based on multiple parallel Self-Organizing Maps (SOMs). In the shot boundary detection task we projected feature vectors calculated from successive frames on parallel SOMs and monitored the trajectories to detect the shot boundari...

متن کامل

A Novel Approach to Background Subtraction Using Visual Saliency Map

Generally human vision system searches for salient regions and movements in video scenes to lessen the search space and effort. Using visual saliency map for modelling gives important information for understanding in many applications. In this paper we present a simple method with low computation load using visual saliency map for background subtraction in video stream. The proposed technique i...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2014